Fall 2018

Geocoding in R

Introductions

About me

About you - why are you here?

Workshop Overview

  • Intro to Geocoding
  • Review of Geocoding tools
  • Geocoding in R with ggmap
  • Geocoing in R with ESRI World Geocoding Service

What is Geocoding

Obtain geographic coordinates for a named place, street address, or zip code.

Types of places you can Geocode

  • state, city, building,
  • street address or intersection,
  • named place (eg mountain), landmark,
  • crime or other event location,
  • zip code, etc.

Why Geocode?

  • Display locations on a map

  • Link locations to other data

  • Spatial analysis
    • Calculate distance, direction, area, etc.
    • Identify patterns & relationships:
      • clusters, outliers, neighbors

Geocoding Components

File of Input locations

  • places of the same type in a standard format

A Geocoder

  • Software and a reference database
  • The software compares input locations to database & returns matches

Output file

  • Geographic coordinates for each matched input location
  • Metadata about the match

Reference Databases

What types of data are available/ should be used for matching

  • Rooftop
  • Parcel
  • Street Addresses
  • Street Address Range
  • Street, City, Zipcode, State

Dual Address Range

Where do you think the point for 275 5th St NE would be located?

Dual Address Range

Where do you think the point for 275 5th St NE would be located?

Parcel & Rooftop Reference Database

Addresses would match to centroid of parcel or rooftop

Reference Database Quality

Geocode with maps.google.com

google maps

Try entering a city, landmark (Sather Gate), zipcode, address, etc.

Geocoding Workflow

  • Consider your needs & options
  • Choose geocoder
  • Preprocess / clean your input data
  • Geocode
  • Evaluate your output
  • Postprocess as needed

Considerations

  • Geographic & temporal scope
  • Output type and quality
  • Volume / speed
  • Security
  • Costs of software and data
  • Ease of use

Geocoding Options

Local Geocoding

Needed when

  • Input data are confidential or too large

You can use ESRI ArcGIS/ArcPro with local reference database

- you need to install local reference database
- limited to USA & Canada
- needs to be updated manually

Available in the D-Lab

Geocoding Practice

Open these geocoding tools in 3 different browser windows

Compare results

Geocode the following address in all three tools:

2625 Dana St, Berkeley, CA, 94704

Geocoding in R

Geocoding with ggmap

This has been the most common R package for geocoding.

But, beginning July 2018 Google changed its pricing model.

For API access you need to register a credit card with Google

  • $200 worth of free API services per month

  • That's 40,000 geocoded locations

See the Google Geocoding API Documentation for details

Geocoding with ggmap

Worth the trouble because output is great quality.

But only if you can afford the costs!

Let's try it.

ggmap

Load the library

library(ggmap)
## Loading required package: ggplot2
## Google Maps API Terms of Service: http://developers.google.com/maps/terms.
## Please cite ggmap if you use it: see citation("ggmap") for details.

ggmap

Register your API key

#mykey <- "AIzaSyxxxxxxxxxxxxxxxxxxxxxxxxxOQyOFWrTw"
mykey <- "AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw"
register_google(key=mykey)

Geocode a city

geocode("San Francisco, CA")

Geocode a state

geocode("California")

Geocode a landmark

geocode("Golden Gate Bridge")

More detailed output

geocode("Golden Gate Bridge", output="more")

Reverse Geocode

revgeocode(c(-122.4194,37.77493), output="more")

Geocode an address

geocode_ouput <-geocode("2625 Dana St, Berkeley CA, 94704", output="more")

"Batch" Geocode

Geocode a set of places at once.

Usually means you upload the entire file to the geocoder.

This is much faster but not supported by most online services

With ggmap::geocode we serial process the set in one command.

Read in file of addesses

# File of addresses
address_data <- read.csv("address_data/oak_liquor_stores.csv", stringsAsFactors = F)

# Take a look
head(address_data)
##   id               name              street     city state   zip type
## 1  1    Wah Fay Liquors        2101 8th Ave  Oakland    CA 94606    p
## 2  2      Vision Liquor 1615 Macarthur Blvd  Oakland    CA 94602    p
## 3  3    Souza's Liquors         394 12th St  Oakland    CA 94607    p
## 4  4         Tk Liquors       1500 23th Ave  Oakland    CA 94606    p
## 5  5 Quadriga Wines Inc   6193 Ridgemont Dr  Oakland    CA 94619    p
## 6  6             Bev Mo   525 Embarcadero W  Oakland    CA 94607    c

Format addresses

Many online services want a single column with the address data, like "2625 Dana St, Berkeley CA, 94704"

address_data$full_address <- paste0(address_data$street, ", " , 
                                    address_data$city, ", " , 
                                    address_data$state, " ", 
                                    address_data$zip)

# Take a look
head(address_data, 3)
##   id            name              street    city state   zip type
## 1  1 Wah Fay Liquors        2101 8th Ave Oakland    CA 94606    p
## 2  2   Vision Liquor 1615 Macarthur Blvd Oakland    CA 94602    p
## 3  3 Souza's Liquors         394 12th St Oakland    CA 94607    p
##                             full_address
## 1        2101 8th Ave, Oakland, CA 94606
## 2 1615 Macarthur Blvd, Oakland, CA 94602
## 3         394 12th St, Oakland, CA 94607

Ready to geocode

google_geocoded <- geocode(address_data$full_address, output = "more", 
                           source = "google", key=mykey)
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=2101%208th%20Ave%2C%20Oakland%2C%20CA%2094606&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Warning: package 'bindrcpp' was built under R version 3.4.4
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1615%20Macarthur%20Blvd%2C%20Oakland%2C%20CA%2094602&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=394%2012th%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1500%2023th%20Ave%2C%20Oakland%2C%20CA%2094606&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=6193%20Ridgemont%20Dr%2C%20Oakland%2C%20CA%2094619&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=525%20Embarcadero%20W%2C%20%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=5403%20Foothill%20Blvd%2C%20Oakland%2C%20CA%2094601&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1200%2078th%20Ave%2C%20Oakland%2C%20CA%2094621&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=828%20Franklin%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=5913%20International%20Blvd%2C%20Oakland%2C%20CA%2094621&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3210%20Harrison%20St%2C%20Oakland%2C%20CA%2094611&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1460%207th%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1333%20Peralta%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3710%20Telegraph%20Ave%2C%20Oakland%2C%20CA%2094609&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3293%20Lakeshore%20Ave%2C%20Oakland%2C%20CA%2094610&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=1647%208th%20St%2C%20Oakland%2C%20CA%2094607&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3849%20Martin%20Luther%20King%20Jr%20Way%2C%20Oakland%2C%20CA%2094609&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=3900%20Grand%20Ave%2C%20Oakland%2C%20CA%2094610&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=7305%20Edgewater%20Dr%20%23D%2C%20Oakland%2C%20CA%2094621&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=%20350%20E%2018th%20St%2C%20Oakland%2C%20CA%2094606&key=AIzaSyDf-SZG8O4hj1c06VQ-k6hkBrOQyOFWrTw

View results

Google returns a lot of metadata - see documentation

head(google_geocoded)
##         lon      lat           type loctype
## 1 -122.2449 37.79836        premise rooftop
## 2 -122.2237 37.80033 street_address rooftop
## 3 -122.2705 37.80262 street_address rooftop
## 4 -122.2350 37.78426 street_address rooftop
## 5 -122.1672 37.78434        premise rooftop
## 6 -122.2791 37.79590 street_address rooftop
##                                        address    north    south      east
## 1         2101 8th ave, oakland, ca 94606, usa 37.79972 37.79702 -122.2435
## 2  1615 macarthur blvd, oakland, ca 94602, usa 37.80168 37.79898 -122.2223
## 3          394 12th st, oakland, ca 94607, usa 37.80397 37.80127 -122.2692
## 4        1500 23rd ave, oakland, ca 94606, usa 37.78561 37.78291 -122.2336
## 5    6193 ridgemont dr, oakland, ca 94619, usa 37.78569 37.78299 -122.1658
## 6 525 embarcadero west, oakland, ca 94607, usa 37.79725 37.79455 -122.2778
##        west street_number               route locality
## 1 -122.2462          2101          8th Avenue  Oakland
## 2 -122.2250          1615 MacArthur Boulevard  Oakland
## 3 -122.2719           394         12th Street  Oakland
## 4 -122.2363          1500         23rd Avenue  Oakland
## 5 -122.1685          6193     Ridgemont Drive  Oakland
## 6 -122.2805           525    Embarcadero West  Oakland
##   administrative_area_level_2 administrative_area_level_1       country
## 1              Alameda County                  California United States
## 2              Alameda County                  California United States
## 3              Alameda County                  California United States
## 4              Alameda County                  California United States
## 5              Alameda County                  California United States
## 6              Alameda County                  California United States
##   postal_code postal_code_suffix       neighborhood subpremise
## 1       94606               2007               <NA>       <NA>
## 2       94602               1606           Glenview       <NA>
## 3       94607               4249   Downtown Oakland       <NA>
## 4       94606               5035 Rancho San Antonio       <NA>
## 5       94619               3724      Caballo Hills       <NA>
## 6       94607               3565   Downtown Oakland       <NA>

Add Geocoded results to input file

# Add geocoded points to input data frame
# - works because num rows & row order returned by geocoder the same
address_data$glon <- google_geocoded$lon
address_data$glat <- google_geocoded$lat

head(address_data, 3)  # take a look
##   id            name              street    city state   zip type
## 1  1 Wah Fay Liquors        2101 8th Ave Oakland    CA 94606    p
## 2  2   Vision Liquor 1615 Macarthur Blvd Oakland    CA 94602    p
## 3  3 Souza's Liquors         394 12th St Oakland    CA 94607    p
##                             full_address      glon     glat
## 1        2101 8th Ave, Oakland, CA 94606 -122.2449 37.79836
## 2 1615 Macarthur Blvd, Oakland, CA 94602 -122.2237 37.80033
## 3         394 12th St, Oakland, CA 94607 -122.2705 37.80262

Map results with Leaflet

library(leaflet)

point_map <- leaflet(address_data) %>%
  addTiles() %>%   
  addMarkers(lat=~glat, lng=~glon, 
            popup=(paste0(address_data$name, "</br>",
                          address_data$full_address)
            )
  )

Display the map

point_map

ESRI World Geocoding Service

ESRI World Geocoding Service

UC Berkeley has an ESRI Site licence

So we can use the World Geocoding Service for free.

This is a great value as geocoding is expensive!

ESRI World Geocoding Service

In order to access the World Geocoding Service you need to first get an API key.

ESRI calls this an access token.

Instructions for this are on page 15 of this document.

Let's try it!

ESRI WGS Access Token

You can use the token below for this workshop.

Tokens are temporary

  • By default, a token expires in 2 hours.
  • Add an expiration parameter when requesting a token to receive one that is valid for a shorter or longer period of time.
  • The maximum lifetime of a token is 14 days even if you specify a longer expiration period.

ESRI WGS Token

ESRI WGS Token

Your token should look something like this:

my_esri_token <- "iNFc0fq6bGlj6mUfptHgEGEjVeKZ0T5EPNAkVjgxBotwDaXxgykmwDNR0AkEvPi8Sh0QESWgt0IhmsbbO5UA3H-b2_T5yzhn1fW--WwD2O-7V2gbGCEaF4jg38dUQ7LhpTBT_XqYy0vV7Mw_e2hWUA.."

Functions for Geocoding with the ESRI WGS

Two functions: geocode_one and geocode_many

These functions are long and ugly.

You can reference them by sourcing this script.

Take a look at the file on your computer

source("./scripts/esri_wgs_geocoding.R")

Geocode One Address

geocode_one("2625 Dana St, Berkeley, CA, 94704", my_esri_token, 
            postal = TRUE)
## Loading required package: httr
##         lon      lat score locName status
## 1 -122.2601 37.86202   100   World      M
##                                   matchAddr side  addressType
## 1 2625 Dana St, Berkeley, California, 94704    R PointAddress

Geocode Many

# Batch geocode your dataframe of addresses with the following function
esri_geocoded <- geocode_many(address_data$id, address_data$street, 
                              address_data$city, address_data$state, 
                              as.character(address_data$zip), my_esri_token)
## Loading required package: rjson

Take a look at results

ESRI WGS also returns a lot of metadata

head(esri_geocoded, 3)
##   ID       lon      lat score locName status
## 1  2 -122.2236 37.80054   100   World      M
## 2  4 -122.2351 37.78423   100   World      M
## 3  1 -122.2448 37.79827   100   World      M
##                                         matchAddr side   addressType
## 1 1615 MacArthur Blvd, Oakland, California, 94602    L  PointAddress
## 2       1500 23rd Ave, Oakland, California, 94606    R StreetAddress
## 3        2101 8th Ave, Oakland, California, 94606    L  PointAddress

Add ESRI WGS Results to input data

address_data <- merge(address_data, esri_geocoded[c("ID","lon","lat")], 
                      by.x="id",by.y = "ID", all.x = T)

# Take a look
head(address_data, 3)
##   id            name              street    city state   zip type
## 1  1 Wah Fay Liquors        2101 8th Ave Oakland    CA 94606    p
## 2  2   Vision Liquor 1615 Macarthur Blvd Oakland    CA 94602    p
## 3  3 Souza's Liquors         394 12th St Oakland    CA 94607    p
##                             full_address      glon     glat       lon
## 1        2101 8th Ave, Oakland, CA 94606 -122.2449 37.79836 -122.2448
## 2 1615 Macarthur Blvd, Oakland, CA 94602 -122.2237 37.80033 -122.2236
## 3         394 12th St, Oakland, CA 94607 -122.2705 37.80262 -122.2705
##        lat
## 1 37.79827
## 2 37.80054
## 3 37.80238

Map geocoded data with Leaflet

point_map <- leaflet() %>%
  addTiles() %>%   
  addMarkers(lat=address_data$glat, lng=address_data$glon, 
            popup=(paste0(address_data$name, "</br>",
                          address_data$full_address)) 
            ) %>%   
  addCircleMarkers(lat=address_data$lat, lng=address_data$lon,
            color="black",fillColor="red",
            popup=(paste0(address_data$name, "</br>",
                          address_data$full_address))
  )

Display the map

point_map

Geocoding Historic Locations

Good question!

Save output to file

You don't want to re-geocode!

write.csv(google_geocoded,file="output/address_data_geocoded_google.csv", 
          row.names=FALSE)

write.csv(esri_geocoded,file="output/address_data_geocoded_esri.csv", 
          row.names=FALSE)

Geocoding Results

Assessing Output Quality

  • Map the results

  • Examine the range of coordinates
    • CA: -124, 32, -114, 42
  • Review output data & metadata
    • specific to the geocoder

Geocoding Output

  • With a little preprocessing most reliable geocoders will be able to geocode 80% or more of your addresses within a street block of the actual location.
    • based on my my experience!
    • assumes US addresses.
  • Cleaning and standardizing addresses can be a lot of work!
    • unlikely to get it perfect

Standardize Addresses

  • provide all components
  • remove unnecessary components
  • remove duplicates
  • remove extra spaces or commas
  • remove odd characters like “#” “/”, “@”
  • standardize capitalization

Standardize Addresses

  • Intersections
    • Corner of Main and Long Ave should be Main & Long
  • Numbered Streets
    • Fourth St should be 4th St
  • Directional Prefixes
    • North, No, N., etc should be N
  • Apartment numbers and letters
    • Remove them!
  • Use PO Box
    • unless you have the address!

Use Standard Abbreviations

Use For These
HWY Highway
LN Lane
DR Drive
EXPY Expressway

Problems in Reference Database

  • Incorrect street ranges
  • Inaccurate or low quality features
  • Inaccurate feature attributes

  • Missing streets
  • Address changes

Output Quality & Population Density

lonlat

Questions?

Linking to Census Data

Linking to Census Data

The US Census Bureau has been collecting data about the US population since 1790!

Survey data from the decennial census and the 5 year American Community Survey are two of the most commonly used databases.

Publicly available census data are aggregated to census geographies.

Census Geographies

Census Data

Census data are linked to Census geographies by IDs called FIPS Codes or GEOIDs

Census FIPS Codes / GEOIDs

Linking points to census data

In order to link geocoded points to census data you need to get the census geographic FIPS codes for the points.

There are a number of ways to do this.

Important Note

Census geographies can change every 10 years due to changes in the population.

For example,

  • if the population of an area increases, census tracts can split

  • if the population of an area decreases, tracts can merge

This makes comparisons between censuses more complicated.

  • You need to get the correct census identifiers for the year(s) of interest

US Census Geocoding Service

If you geocode with the US Census Geocoding Service you can get the census IDs from the geocoder.

Try that with "2625 Dana St, Berkeley, CA, 94704"

FCC Census Block API

FCC Block API

Source the script file to load the function

latlon2fips

Then try it with -122.273634, 37.852562

source("./scripts/fcc_latlon2fips.R")

# test one coordinate pair
latlon2fips(latitude=37.852562, longitude=-122.273634)
## [1] "060014240011002"

Get FIPs for all of our geocoded addresses

Apply the latlon2fips function to all rows in our dataframe

Use the coordinates returned by the Google Geocoding API

address_data$fips<- mapply(latlon2fips, address_data$glat, 
                           address_data$glon)

# Take a look
head(address_data, 3)
##   id            name              street    city state   zip type
## 1  1 Wah Fay Liquors        2101 8th Ave Oakland    CA 94606    p
## 2  2   Vision Liquor 1615 Macarthur Blvd Oakland    CA 94602    p
## 3  3 Souza's Liquors         394 12th St Oakland    CA 94607    p
##                             full_address      glon     glat       lon
## 1        2101 8th Ave, Oakland, CA 94606 -122.2449 37.79836 -122.2448
## 2 1615 Macarthur Blvd, Oakland, CA 94602 -122.2237 37.80033 -122.2236
## 3         394 12th St, Oakland, CA 94607 -122.2705 37.80262 -122.2705
##        lat            fips
## 1 37.79827 060014055003004
## 2 37.80054 060014049003026
## 3 37.80238 060014030001005

Get FIPS spatially

We can download geographic data from the Census

and then spatially join the geocoded points to the census geographic data

to get the FIPS codes for each point

Get FIPS spatially

We can use the tigris package to download census geographic data.

library(sp)
## Warning: package 'sp' was built under R version 3.4.3
library(tigris)
## To enable 
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
## 
## Attaching package: 'tigris'
## The following object is masked from 'package:graphics':
## 
##     plot

tigris

Set tigris options & download data

options(tigris_class = "sp")   # options are sp or sf
options(tigris_use_cache = F)  # set to true to save locally

tracts2010 <- tracts(state = '06', county= '001', cb = F, year=2010)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |                                                                 |   1%
  |                                                                       
  |=                                                                |   1%
  |                                                                       
  |=                                                                |   2%
  |                                                                       
  |==                                                               |   2%
  |                                                                       
  |==                                                               |   3%
  |                                                                       
  |==                                                               |   4%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |====                                                             |   5%
  |                                                                       
  |====                                                             |   6%
  |                                                                       
  |====                                                             |   7%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |======                                                           |  10%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=======                                                          |  11%
  |                                                                       
  |=======                                                          |  12%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |========                                                         |  13%
  |                                                                       
  |=========                                                        |  13%
  |                                                                       
  |=========                                                        |  14%
  |                                                                       
  |=========                                                        |  15%
  |                                                                       
  |==========                                                       |  15%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |===========                                                      |  16%
  |                                                                       
  |===========                                                      |  17%
  |                                                                       
  |===========                                                      |  18%
  |                                                                       
  |============                                                     |  18%
  |                                                                       
  |============                                                     |  19%
  |                                                                       
  |=============                                                    |  19%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |=============                                                    |  21%
  |                                                                       
  |==============                                                   |  21%
  |                                                                       
  |==============                                                   |  22%
  |                                                                       
  |===============                                                  |  22%
  |                                                                       
  |===============                                                  |  23%
  |                                                                       
  |===============                                                  |  24%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |=================                                                |  25%
  |                                                                       
  |=================                                                |  26%
  |                                                                       
  |=================                                                |  27%
  |                                                                       
  |==================                                               |  27%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |===================                                              |  28%
  |                                                                       
  |===================                                              |  29%
  |                                                                       
  |===================                                              |  30%
  |                                                                       
  |====================                                             |  30%
  |                                                                       
  |====================                                             |  31%
  |                                                                       
  |====================                                             |  32%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=====================                                            |  33%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |======================                                           |  34%
  |                                                                       
  |======================                                           |  35%
  |                                                                       
  |=======================                                          |  35%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |========================                                         |  36%
  |                                                                       
  |========================                                         |  37%
  |                                                                       
  |========================                                         |  38%
  |                                                                       
  |=========================                                        |  38%
  |                                                                       
  |=========================                                        |  39%
  |                                                                       
  |==========================                                       |  39%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |==========================                                       |  41%
  |                                                                       
  |===========================                                      |  41%
  |                                                                       
  |===========================                                      |  42%
  |                                                                       
  |============================                                     |  42%
  |                                                                       
  |============================                                     |  43%
  |                                                                       
  |============================                                     |  44%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |=============================                                    |  45%
  |                                                                       
  |==============================                                   |  45%
  |                                                                       
  |==============================                                   |  46%
  |                                                                       
  |==============================                                   |  47%
  |                                                                       
  |===============================                                  |  47%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |================================                                 |  48%
  |                                                                       
  |================================                                 |  49%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================                                |  50%
  |                                                                       
  |=================================                                |  51%
  |                                                                       
  |=================================                                |  52%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |==================================                               |  53%
  |                                                                       
  |===================================                              |  53%
  |                                                                       
  |===================================                              |  54%
  |                                                                       
  |===================================                              |  55%
  |                                                                       
  |====================================                             |  55%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |=====================================                            |  57%
  |                                                                       
  |=====================================                            |  58%
  |                                                                       
  |======================================                           |  58%
  |                                                                       
  |======================================                           |  59%
  |                                                                       
  |=======================================                          |  59%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |=======================================                          |  61%
  |                                                                       
  |========================================                         |  61%
  |                                                                       
  |========================================                         |  62%
  |                                                                       
  |=========================================                        |  62%
  |                                                                       
  |=========================================                        |  63%
  |                                                                       
  |=========================================                        |  64%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |==========================================                       |  65%
  |                                                                       
  |===========================================                      |  65%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |===========================================                      |  67%
  |                                                                       
  |============================================                     |  67%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |=============================================                    |  69%
  |                                                                       
  |=============================================                    |  70%
  |                                                                       
  |==============================================                   |  70%
  |                                                                       
  |==============================================                   |  71%
  |                                                                       
  |==============================================                   |  72%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |===============================================                  |  73%
  |                                                                       
  |================================================                 |  73%
  |                                                                       
  |================================================                 |  74%
  |                                                                       
  |================================================                 |  75%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |==================================================               |  76%
  |                                                                       
  |==================================================               |  77%
  |                                                                       
  |==================================================               |  78%
  |                                                                       
  |===================================================              |  78%
  |                                                                       
  |===================================================              |  79%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |====================================================             |  81%
  |                                                                       
  |=====================================================            |  81%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |======================================================           |  82%
  |                                                                       
  |======================================================           |  83%
  |                                                                       
  |======================================================           |  84%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=======================================================          |  85%
  |                                                                       
  |========================================================         |  85%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |========================================================         |  87%
  |                                                                       
  |=========================================================        |  87%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |==========================================================       |  88%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |==========================================================       |  90%
  |                                                                       
  |===========================================================      |  90%
  |                                                                       
  |===========================================================      |  91%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |=============================================================    |  93%
  |                                                                       
  |=============================================================    |  94%
  |                                                                       
  |=============================================================    |  95%
  |                                                                       
  |==============================================================   |  95%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |===============================================================  |  96%
  |                                                                       
  |===============================================================  |  97%
  |                                                                       
  |===============================================================  |  98%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |================================================================ |  99%
  |                                                                       
  |=================================================================|  99%
  |                                                                       
  |=================================================================| 100%
tracts2000 <- tracts(state = '06', county= '001', cb = F, year=2000)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |                                                                 |   1%
  |                                                                       
  |=                                                                |   1%
  |                                                                       
  |=                                                                |   2%
  |                                                                       
  |==                                                               |   2%
  |                                                                       
  |==                                                               |   3%
  |                                                                       
  |==                                                               |   4%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |====                                                             |   5%
  |                                                                       
  |====                                                             |   6%
  |                                                                       
  |====                                                             |   7%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |======                                                           |  10%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=======                                                          |  11%
  |                                                                       
  |=======                                                          |  12%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |========                                                         |  13%
  |                                                                       
  |=========                                                        |  13%
  |                                                                       
  |=========                                                        |  14%
  |                                                                       
  |=========                                                        |  15%
  |                                                                       
  |==========                                                       |  15%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |===========                                                      |  16%
  |                                                                       
  |===========                                                      |  17%
  |                                                                       
  |===========                                                      |  18%
  |                                                                       
  |============                                                     |  18%
  |                                                                       
  |============                                                     |  19%
  |                                                                       
  |=============                                                    |  19%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |=============                                                    |  21%
  |                                                                       
  |==============                                                   |  21%
  |                                                                       
  |==============                                                   |  22%
  |                                                                       
  |===============                                                  |  22%
  |                                                                       
  |===============                                                  |  23%
  |                                                                       
  |===============                                                  |  24%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |=================                                                |  25%
  |                                                                       
  |=================                                                |  26%
  |                                                                       
  |=================                                                |  27%
  |                                                                       
  |==================                                               |  27%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |===================                                              |  28%
  |                                                                       
  |===================                                              |  29%
  |                                                                       
  |===================                                              |  30%
  |                                                                       
  |====================                                             |  30%
  |                                                                       
  |====================                                             |  31%
  |                                                                       
  |====================                                             |  32%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=====================                                            |  33%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |======================                                           |  34%
  |                                                                       
  |======================                                           |  35%
  |                                                                       
  |=======================                                          |  35%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |========================                                         |  36%
  |                                                                       
  |========================                                         |  37%
  |                                                                       
  |========================                                         |  38%
  |                                                                       
  |=========================                                        |  38%
  |                                                                       
  |=========================                                        |  39%
  |                                                                       
  |==========================                                       |  39%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |==========================                                       |  41%
  |                                                                       
  |===========================                                      |  41%
  |                                                                       
  |===========================                                      |  42%
  |                                                                       
  |============================                                     |  42%
  |                                                                       
  |============================                                     |  43%
  |                                                                       
  |============================                                     |  44%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |=============================                                    |  45%
  |                                                                       
  |==============================                                   |  45%
  |                                                                       
  |==============================                                   |  46%
  |                                                                       
  |==============================                                   |  47%
  |                                                                       
  |===============================                                  |  47%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |================================                                 |  49%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================                                |  50%
  |                                                                       
  |=================================                                |  51%
  |                                                                       
  |=================================                                |  52%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |==================================                               |  53%
  |                                                                       
  |===================================                              |  53%
  |                                                                       
  |===================================                              |  54%
  |                                                                       
  |===================================                              |  55%
  |                                                                       
  |====================================                             |  55%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |=====================================                            |  57%
  |                                                                       
  |=====================================                            |  58%
  |                                                                       
  |======================================                           |  58%
  |                                                                       
  |======================================                           |  59%
  |                                                                       
  |=======================================                          |  59%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |=======================================                          |  61%
  |                                                                       
  |========================================                         |  61%
  |                                                                       
  |========================================                         |  62%
  |                                                                       
  |=========================================                        |  62%
  |                                                                       
  |=========================================                        |  63%
  |                                                                       
  |=========================================                        |  64%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |==========================================                       |  65%
  |                                                                       
  |===========================================                      |  65%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |===========================================                      |  67%
  |                                                                       
  |============================================                     |  67%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |=============================================                    |  68%
  |                                                                       
  |=============================================                    |  69%
  |                                                                       
  |=============================================                    |  70%
  |                                                                       
  |==============================================                   |  70%
  |                                                                       
  |==============================================                   |  71%
  |                                                                       
  |==============================================                   |  72%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |===============================================                  |  73%
  |                                                                       
  |================================================                 |  73%
  |                                                                       
  |================================================                 |  74%
  |                                                                       
  |================================================                 |  75%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |==================================================               |  76%
  |                                                                       
  |==================================================               |  77%
  |                                                                       
  |==================================================               |  78%
  |                                                                       
  |===================================================              |  78%
  |                                                                       
  |===================================================              |  79%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |====================================================             |  81%
  |                                                                       
  |=====================================================            |  81%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |======================================================           |  82%
  |                                                                       
  |======================================================           |  83%
  |                                                                       
  |======================================================           |  84%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=======================================================          |  85%
  |                                                                       
  |========================================================         |  85%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |========================================================         |  87%
  |                                                                       
  |=========================================================        |  87%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |==========================================================       |  90%
  |                                                                       
  |===========================================================      |  90%
  |                                                                       
  |===========================================================      |  91%
  |                                                                       
  |===========================================================      |  92%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |=============================================================    |  93%
  |                                                                       
  |=============================================================    |  94%
  |                                                                       
  |=============================================================    |  95%
  |                                                                       
  |==============================================================   |  95%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |===============================================================  |  96%
  |                                                                       
  |===============================================================  |  97%
  |                                                                       
  |===============================================================  |  98%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |================================================================ |  99%
  |                                                                       
  |=================================================================|  99%
  |                                                                       
  |=================================================================| 100%

Plot the tracts

plot(tracts2000, border="red")
plot(tracts2010, add=T)

Make address points spatial

Use the sp library to convert the address_data to a SpatialPointsDataFrame

Then, set the coordinate reference system, or CRS to be the same as that of the census tracts.

address_data_sp<- address_data  #make copy
coordinates(address_data_sp) <-c("glon", "glat")
proj4string(address_data_sp) <- CRS(proj4string(tracts2010))

A word about Geographic Coordinates

The Google Geocoding API returns geographic coordinates referenced to the World Geodetic System of 1984 or WGS84.

Census geographic data are referenced to the North American Datum of 1983 or NAD83.

These two geographic CRSs can differ by up to 1 meter in the continental USA.

It's up to you to decide if this could impact your spatial analysis.

Plot the tracts and points

plot(tracts2000, border="red")
plot(tracts2010, add=T)
points(address_data_sp, col="green")

Spatial Overlay

The sp::over function spatially compares the two input spatial objects and returns the data values for the census tract in which each point is located.

fips2000 <- over(address_data_sp, tracts2000)
fips2010 <-over(address_data_sp, tracts2010)

# Uncomment to take a look at the output
#head(fips2000, 3)
#head(fips2010, 3)

Add the FIPS to the address dataframe

address_data$GEOID10 <- fips2010$GEOID10
address_data$CTIDFP00 <- fips2000$CTIDFP00

# head(address_data)

Questions?

Discuss the 3 methods for getting the census FIPS codes for the geocoded points.

  1. When geocoding

  2. Using an API

  3. Using spatial overlay

Fetching Census data

Fetching Census data

We can take this one step further and retrieve census data for the census tracts of interest.

We do this with the tidycensus package

Get Census Data

library(tidycensus)
## Warning: package 'tidycensus' was built under R version 3.4.3
# Load census key - Patty's key, apply for your own!
my_census_api_key <- "f2d6f4f743545d3a42a67412b05935dc7712c432"

# Make tigris aware of census key
census_api_key(my_census_api_key)
## To install your API key for use in future sessions, run this function with `install = TRUE`.

Identify state(s) and county(s) of interest

my_states<- c("06") # CA
my_counties <- c("001") # Alameda County

Identify ACS census variables

Our scenario of interest is to identify liquor stores in census tracts with a relatively high population of minors.

So, we want to find the census variables for total population and population under 18

cenvar_table <-load_variables(year=2016, dataset = "acs5", cache=T)

# Take a look at the data frame to find the names of census tables
# This will give you the table and variable codes
# Use "View(cenvar_table)" and filter within the table to get the codes for a specific variable

# View(cenvar_table) 

Census Variables

Indentify the tables and variables of interest in the 5 year ACS Data (2012-2016)

pop_total <- "B01001_001E"
pop_under18 <- "B09001_001E"

Fetch Data of interest

pop_acs5_2016 <-get_acs(geography = "tract", 
                        variables = c(pop_total,pop_under18), 
                       year=2016, survey="acs5",
                       state = my_states, county = my_counties,
                       geometry = F)
## Please note: `get_acs()` now defaults to a year or endyear of 2016.
#head(pop_acs5_2016)

Reshape the data

Load libraries

library(tidyr)
## Warning: package 'tidyr' was built under R version 3.4.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Reshape the data

# Select the columnbs of interest
# and put `totpop` and `under18` in their own columns
pop2 <- pop_acs5_2016 %>%
  select("GEOID","variable","estimate") %>%
  spread(key=variable, value=estimate)

# Rename columns
colnames(pop2)<-c("GEOID10","totpop","under18")
head(pop2)
## # A tibble: 6 x 3
##   GEOID10     totpop under18
##   <chr>        <dbl>   <dbl>
## 1 06001400100  3018.    387.
## 2 06001400200  1960.    335.
## 3 06001400300  5236.    922.
## 4 06001400400  4171.    655.
## 5 06001400500  3748.    500.
## 6 06001400600  1661.    233.

Join population data to Address data

pop2$pct_under18 <- round((pop2$under18 / pop2$totpop) * 100, 1)
address_data2 <- merge(address_data, pop2, by="GEOID10", all.x=T)

# Take a look - what do you think?
#head(address_data2)

Join Population data to Census tracts to map

We just joined the census data to the address points.

We can also join it to the census tracts.

tracts2010 <- merge(tracts2010, pop2, by="GEOID10")

# Take a look:
# head(tracts2010@data)

Note: this is sp::merge so we do not reference the @data slot!

Map the data

quantColors <- colorQuantile("Reds", tracts2010$pct_under18, n=5)

point_map <- leaflet() %>%
  addTiles() %>%   
  addPolygons(data=tracts2010, 
              color="white", 
              weight=1, 
              opacity=0.5,
              fillColor= ~quantColors(pct_under18),
              fillOpacity = 0.75, 
              popup = paste0("<b>Percent under 18:</b> ", tracts2010$pct_under18, "%")) %>%
  addMarkers(data=address_data, lat=~glat, lng=~glon, 
            popup=(paste0(address_data$name, "</br>",
                          address_data$full_address)
            )
  )

Display the map

point_map

Save the output data

#write.csv(address_data2, file="output/address_data_geocoded2.csv", row.names=F)
#writeOGR(tracts2010, dsn="./output", layer="tracts2010", driver="ESRI Shapefile")

Questions?

Getting Help

  • Review the documents and websites linked in this tutorial

  • Sign up for a consult!

Thank You!